LEARNING BY INDUCTIVE INFERENCE


R.  S.  Michalski 

University  of  Illinois 
Urbana, Illinois 61801


SUMMARY. The paper addresses learning processes which employ
inductive inference. A system of variable-valued logic, called VL2,
is briefly described and its application to implementing inductive
learning processes is discussed. VL2 can be characterized as a
'multi-valued first order predicate logic'. An example is given in
which a computer program learns the difference between two classes
of objects.


INTRODUCTION 

Learning processes can generally be viewed as processes of
determining and representing relationships which exist among
objects. These relationships are determined and represented within
the system which learns (the 'STUDENT') using a source of
information about the objects (the 'TEACHER'). It has been observed
(e.g., Bongard¹) that the smaller the degree of STUDENT-oriented
organization of the information which the TEACHER provides, the
greater the complexity the STUDENT must have. Consequently, learning
processes can be classified according to the degree of organization
of the information provided by the TEACHER. Thus, we can
distinguish, e.g., learning 'by being born' (innate capabilities) or
by being designed (the greatest organization on the part of the
TEACHER), learning by being programmed, learning from examples,
learning from observation ('without a teacher'), and learning by
'inspiration'. In this paper we are concerned with problems which
belong to the area of 'learning from examples'.

Just as physical processes are governed by a law of minimum energy,
it seems (still only intuitively) that information processes, and
thus also learning processes, may be governed by a corresponding law
of 'minimum complexity' (or 'maximum simplicity'). In other words,
information processes seem to have an overall tendency to achieve
given information processing goals by the simplest means (which, in
special cases, just means the minimum number of operations).
Evidence of such a tendency in the area of human literary expression
is Zipf's law.² It seems that all human information processing
activities, in particular scientific activities, are oriented toward
determining adequate and, at the same time, simple descriptions or
explanations of the surrounding environment and phenomena. The
ability to create the simplest descriptions, which use only the
'most significant' concepts and disregard 'irrelevant details', is
highly regarded and considered evidence of intelligence. But how can
we formally define such concepts as the 'simplest description'? How
can we create machines which have the ability to determine such
descriptions?

As Banerji³ pertinently observed, a simple concept for one person
may not be simple for another. His explanation is that 'there is
something in the human mind which, given constant exposure to a
concept, however complicated, makes it simple'. This explanation can
be deepened by saying that a seemingly complex concept becomes
simple if it is well understood, which, in turn, means that its
relationship to well-known concepts has been clearly established.
Therefore, in order to define a measure of the simplicity of
descriptions, two requirements must first be satisfied:

(1)  A  language  in  which  descriptions  are  expressed  has  to  be 
assumed. 

(2) A measure of 'semantic equivalence' of descriptions has to
be established. This condition is necessary because, when
determining the 'simplest description' of whatever we
describe, we want to compare only descriptions which convey
the same information (i.e., which are semantically equivalent).

Once (1) and (2) are satisfied, a measure of the simplicity of
descriptions can easily be formalized. It can be, e.g., a
monotonically decreasing function of the length of a description
(measured, e.g., by the number of occurrences of certain assumed
constructs of the language in the description). If a 'simplicity
function' over the individual constructs is given, then one can
consider a weighted sum of constructs. If only a preference order
of constructs is assumed, then one could use the lexicographic
functional defined by Michalski.
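Both options can be sketched in a few lines. In this minimal sketch a description is reduced to the list of constructs occurring in it; the construct kinds, the weights in `WEIGHTS` and the order in `PRIORITY` are hypothetical choices, and the plain lexicographic comparison of counts is only a simplified stand-in, not the exact definition of the functional cited above.

```python
from collections import Counter

# Hypothetical weights per kind of construct (an assumption: the text
# leaves the "simplicity function" over constructs unspecified).
WEIGHTS = {"term": 3.0, "selector": 1.0, "value": 0.5}

# Hypothetical preference order, most important construct first.
PRIORITY = ["term", "selector", "value"]


def weighted_length(constructs):
    """Weighted sum of the constructs occurring in a description."""
    counts = Counter(constructs)
    return sum(WEIGHTS[kind] * n for kind, n in counts.items())


def simplicity(constructs):
    """A monotonically decreasing function of the weighted length."""
    return 1.0 / (1.0 + weighted_length(constructs))


def lexicographic_key(constructs):
    """Counts in preference order; a smaller tuple means a simpler description."""
    counts = Counter(constructs)
    return tuple(counts[kind] for kind in PRIORITY)


d1 = ["term", "selector", "selector", "value"]
d2 = ["term", "term", "selector"]
print(weighted_length(d1), weighted_length(d2))       # 5.5 7.0
print(lexicographic_key(d1) < lexicographic_key(d2))  # True
```

The weighted sum needs numeric weights for every construct, while the lexicographic key needs only their order, which mirrors the two cases distinguished in the paragraph.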

In this paper we present some recent results from our work on the
theory and computer implementation of systems which can learn the
'simplest descriptions' by executing an inductive inference process
('inductive learning').


LANGUAGE  FOR  EXPRESSING  DESCRIPTIONS:   SYSTEM  VL2 

The formal system which we are currently developing as a tool for
expressing descriptions and implementing inductive learning is a
variable-valued logic system, VL2. This system is an extension of
the system VL1 described by Michalski.⁵,⁶ The VL2 system gives a
sound formal basis for developing an 'algebra of descriptions' which
would enable one, for example, to build descriptions, to simplify
them, to generalize them to various degrees, to compare descriptions
of individual objects or classes of objects, to infer a description
of a class of objects from examples of objects of this class, etc.

The full definition of the system VL2 is not yet available. For the
purposes of this paper we will briefly and informally describe some*
of the concepts of the system that are most relevant to our subject.

To keep the presentation simple, we will relate our description of
the system to the widely used first order predicate logic (FOPL):

1. In FOPL, the atomic formulas (k-ary predicate symbols
followed by k occurrences of variables, function forms and/or
constants) are assumed to be binary-valued (true or false).
In VL2, these formulas (called atomic forms) are treated
as functions which, like their arguments, range over
independent domains. These domains are chosen as most
appropriate for the interpretation of the atomic forms and
their arguments, or for the problem at hand.

2. The atomic forms occur in a wff of VL2 (a VL2 formula) as
parts of a broader concept, the selector, and are generally
not VL2 formulas when standing alone (except in the case
when a VL2 formula reduces to a FOPL formula).

3. VL2 formulas range over an output domain, denoted D, which
is a linearly ordered set having a smallest and a largest
element.


* In the full definition of VL2 there are more operations than
those described here, and the concept of selector has a broader
meaning.


4. The selector is defined as a selector statement, SS,
enclosed in brackets:

[SS]  (1) 

The  selector  statement  is  either  a  conditional  statement: 

L#R  (2) 

or  a  quantified  statement 

Q(L#R)  (3)

where 

L - called the left part of the conditional statement, or
the referee, is either a VL2 formula (see point 5) or
a form which can be described as a quantifier-free
FOPL formula over atomic forms. It will be assumed
for the purpose of this paper that this FOPL formula
is in disjunctive normal form, and that 'or' is denoted
by ',', 'and' by '.', and negation by a bar over the
predicate symbol. For example, the FOPL formula

$P_1(x, f(y)) \wedge \overline{P}_2(y) \vee P_3(x, y, c)$   (4)

where

P1(x, f(y)), P2(y), P3(x, y, c) - atomic forms

x, y - variables; f(y) - a function of y

c - a constant

is written as

$P_1(x, f(y)) \cdot \overline{P}_2(y) \,,\, P_3(x, y, c)$   (5)

# denotes '=' or '≠'

R - called the right part of the conditional statement, or the
reference, is a subset of the union of the domains of the
atomic forms in L, or a VL2 formula.

Q - a sequence of existential (∃xi) and/or universal (∀xi)
quantifier forms, where the xi are variables in the atomic
forms of L.

Examples  of  a  selector: 

[p(x,y) = 3]  (6)

[p1(x,a).p2(y,z)  =  2,4]  (7) 


[∃x ∀y (p1(x,y,b) ∨ p2(y,c) = 0,2)]  (8)

The selector in which SS is a conditional statement is
called a conditional selector (e.g., (6) and (7)); otherwise it
is called a quantified selector (e.g., (8)). A conditional
selector [L # R] in which the referee L is a single atomic
form Pi and the reference R is a subset of its domain is
called a simple selector.

A simple selector [Pi = R] ([Pi ≠ R]) is said to be
satisfied iff the value of the atomic form Pi is (is not)
an element of R. If P, P1 and P2 are atomic forms, then:

[P̄ = R] is satisfied iff [P ≠ R] is satisfied

[P̄ ≠ R] is satisfied iff [P = R] is satisfied

[P1.P2 # R] is satisfied iff [P1 # R] and [P2 # R] are
satisfied

[P1,P2 # R] is satisfied iff [P1 # R] or [P2 # R] is
satisfied

[(∃x)(P # R)] is satisfied iff, for given values of all
free variables in P (i.e., variables other
than x), there exists a value of x which
satisfies the selector [P # R]

[(∀x)(P # R)] is satisfied iff, for given values of the free
variables, the selector [P # R] is satisfied
for all values of x.
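The satisfaction rules above translate directly into code. This is a minimal sketch, assuming an event is a dictionary from atomic-form names to their current values; the `~` prefix for a barred form, the relation strings `"="`/`"!="`, and the helper names are hypothetical notation chosen for illustration. Quantifiers are evaluated over finite variable domains only.

```python
def flip(rel):
    """The bar swaps the relation: [P-bar = R] behaves as [P != R], and vice versa."""
    return "!=" if rel == "=" else "="


def simple_selector(value, rel, reference):
    """[P = R] ([P != R]) is satisfied iff the value of P is (is not) in R."""
    if rel == "=":
        return value in reference
    if rel == "!=":
        return value not in reference
    raise ValueError(f"unknown relation: {rel!r}")


def selector(event, forms, combine, rel, reference):
    """Conditional selector whose referee is P1.P2... ('and') or P1,P2,... ('or').

    event -- maps atomic-form names to their values
    forms -- names in the referee; a leading '~' marks a barred (negated) form
    """
    results = []
    for form in forms:
        negated = form.startswith("~")
        name = form[1:] if negated else form
        r = flip(rel) if negated else rel
        results.append(simple_selector(event[name], r, reference))
    return all(results) if combine == "and" else any(results)


def exists(domain, satisfied):
    """[(∃x)(P # R)]: some value of x in its finite domain satisfies the selector."""
    return any(satisfied(x) for x in domain)


def forall(domain, satisfied):
    """[(∀x)(P # R)]: every value of x in its finite domain satisfies the selector."""
    return all(satisfied(x) for x in domain)


event = {"p1": "small", "p2": "medium"}
print(selector(event, ["p1", "p2"], "and", "!=", {"medium"}))  # False
print(selector(event, ["p1", "p2"], "or", "!=", {"medium"}))   # True
print(selector(event, ["~p2"], "and", "!=", {"medium"}))       # True
```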

5. A VL2 formula is defined by the following rules:

(i) an element of the output domain D, or a selector
standing alone, is a VL2 formula;

(ii) if V, V1 and V2 are VL2 formulas, then so are:

¬(V), called the inverse of V;

V1 ∧ V2 (also written V1V2), called the conjunction
of V1 and V2;

V1 ∨ V2, called the disjunction of V1 and V2.

A VL2 formula in the form of a disjunction of terms,
where a term is a conjunction of selectors and an element of
D, is called a disjunctive simple VL2 formula and denoted
DVL2.


A VL2 formula which includes only conditional selectors
is called a conditional or quantifier-free formula. In
what follows we will discuss only conditional VL2 formulas.

6. Each VL2 formula V is assigned a value v(V) ∈ D depending on
the values of the atomic forms in it:

(i) The value of an element of D standing alone is
this element itself.

(ii) The value of a selector is the largest element of
D if the selector is satisfied, and otherwise the
smallest element of D.

(iii) If the value of V is the k-th smallest element of
D, then the value of the inverse ¬(V) is the
k-th largest element of D.

V1 ∧ V2 is assigned the smaller of the values of
V1 and V2.

V1 ∨ V2 is assigned the larger of the values of
V1 and V2.
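The value rules amount to min, max and order reversal over the linearly ordered output domain. A minimal sketch, assuming D is given as a Python list in ascending order; the class name `OutputDomain` and its methods are hypothetical.

```python
class OutputDomain:
    """A finite, linearly ordered output domain D (elements listed in ascending order)."""

    def __init__(self, elements):
        self.elements = list(elements)
        self.rank = {e: i for i, e in enumerate(self.elements)}

    def selector(self, satisfied):
        """Rule (ii): the largest element if satisfied, otherwise the smallest."""
        return self.elements[-1] if satisfied else self.elements[0]

    def inverse(self, value):
        """Rule (iii): the k-th smallest element maps to the k-th largest."""
        return self.elements[len(self.elements) - 1 - self.rank[value]]

    def conj(self, *values):
        """Conjunction: the smallest of the values."""
        return min(values, key=self.rank.__getitem__)

    def disj(self, *values):
        """Disjunction: the largest of the values."""
        return max(values, key=self.rank.__getitem__)


D = OutputDomain([0, 1, 2, 3, 4])
print(D.inverse(1))                    # 3
print(D.conj(3, D.selector(True)))     # 3
print(D.disj(1, D.selector(False)))    # 1
```

Because comparisons go through `rank`, the same class also works for non-numeric domains such as `["low", "mid", "high"]`.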

For  illustration,  below  is  an  example  of  a  VL2  formula 
and  its  interpretation: 

$4[p_1(x_1,x_2) \cdot p_2(x_2,x_4) \neq \text{medium}][p_3 = \text{true}] \vee 3[p_3 = \text{unknown}] \vee 1[p_4(x_2,x_4) = \text{yellow}, \text{red}]$   (9)

Suppose that the domains of the atomic forms p1(x1,x2), p2(x2,x4),
p3 and p4(x2,x4) are, respectively, D1 = D2 = {small, medium, large},
D3 = {unknown, false, true} and D4 = {white, yellow, blue, red, black},
and that the output domain of formula (9) is D = {0, 1, 2, 3, 4},
ordered as indicated by the numbers.

Formula (9) is assigned the value (has value) 4 iff the atomic
forms p1(x1,x2) and p2(x2,x4), for given values of x1, x2 and x4,
take values not equal to 'medium', and p3 takes the value 'true'. If
this condition is not satisfied and p3 takes the value 'unknown',
then (9) has value 3. If neither of the above conditions holds
and p4(x2,x4), for given values of x2 and x4, takes the value 'yellow'
or 'red', then (9) has value 1. If none of the above conditions
holds, (9) has value 0.
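The interpretation of (9) can be checked mechanically. The sketch below hard-codes formula (9) with the domains given above: each selector takes value 4 or 0, a term takes the minimum of its constant and its selectors, and the formula takes the maximum over its terms. The function name `formula_9` is hypothetical, and the atomic forms are passed in as already-computed values for fixed x1, x2, x4.

```python
from collections import Counter
from itertools import product

# Domains from the text.
D1 = D2 = ["small", "medium", "large"]            # p1(x1,x2), p2(x2,x4)
D3 = ["unknown", "false", "true"]                 # p3
D4 = ["white", "yellow", "blue", "red", "black"]  # p4(x2,x4)
D = [0, 1, 2, 3, 4]                               # output domain, ascending


def sel(satisfied):
    """A selector takes the largest element of D if satisfied, else the smallest."""
    return D[-1] if satisfied else D[0]


def formula_9(p1, p2, p3, p4):
    """Value of formula (9) for given values of its atomic forms."""
    term1 = min(4, sel(p1 != "medium" and p2 != "medium"), sel(p3 == "true"))
    term2 = min(3, sel(p3 == "unknown"))
    term3 = min(1, sel(p4 in ("yellow", "red")))
    return max(term1, term2, term3)


print(formula_9("small", "large", "true", "white"))    # 4
print(formula_9("small", "small", "unknown", "red"))   # 3
print(formula_9("medium", "large", "true", "yellow"))  # 1
print(Counter(formula_9(*e) for e in product(D1, D2, D3, D4)))
# Counter({3: 45, 0: 42, 1: 28, 4: 20})
```

The second call shows the precedence described in the text: p4 = 'red' would give 1, but p3 = 'unknown' yields the larger value 3. Value 2 never occurs, since no term carries it.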


BASIC CONCEPTS UNDERLYING INDUCTIVE INFERENCE BY MEANS OF VL2

The subject of inductive inference by means of the VL2 system
is very broad. Because of space limitations, we will only
outline some of its major concepts.


Suppose that the domains of all atomic forms in a VL2 formula are
D1, D2, ..., Dn. The set of all possible sequences of values of the
atomic forms, that is, the set D1 × D2 × ... × Dn, is called the
event space of the formula, and its elements are called events.
The event space of a formula V is denoted by E(V). If the output
domain of V is the set D, then V expresses a function

$f: E(V) \to D$   (10)
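For small domains, the event space and the function (10) can be enumerated explicitly. A minimal sketch; `event_space`, `tabulate`, and the two-form example formula are hypothetical illustrations, with the formula passed in as a Python function of the atomic-form values.

```python
from itertools import product


def event_space(domains):
    """E(V) = D1 x D2 x ... x Dn: every sequence of values of the atomic forms."""
    return list(product(*domains))


def tabulate(formula, domains):
    """The function f: E(V) -> D expressed by a formula, as an explicit table."""
    return {event: formula(*event) for event in event_space(domains)}


# Hypothetical example: V = 1[p1 = 2] ∨ 1[p2 = b] with output domain D = {0, 1}.
domains = [[0, 1, 2], ["a", "b"]]
table = tabulate(lambda p1, p2: 1 if (p1 == 2 or p2 == "b") else 0, domains)
print(len(table))        # 6
print(table[(2, "a")])   # 1
print(table[(0, "a")])   # 0
```

The table grows as the product of the domain sizes, which is why exhaustive tabulation is only practical for illustration.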

The atomic forms in a VL2 formula denote functions of a similar
type; namely, an atomic form pi(x1, x2) denotes a function

$p_i: D_{i_1} \times D_{i_2} \to D_i$   (11)

where Di is the domain of pi(x1, x2), and Di1 and Di2 are the
domains of x1 and x2, respectively.

The atomic forms, however, do not express the functions (11);
they only denote their names and arguments. In what follows we
make the simplifying assumption that these functions are fixed and
can be computed for any given values of their input variables.

Let V1 and V2 be two VL2 formulas having comparable sets of
atomic forms* (i.e., one set includes or is equal to the other).
Let E be a subset of the event space ℰ specified by the domains
of the larger of the two sets of atomic forms. Formulas V1 and V2
are called semantically E-equivalent, which we write

$V_1 \overset{E}{\equiv} V_2$   (12)

iff for every e ∈ E

$v(V_1) = v(V_2)$   (13)


If E = ℰ, then V1 and V2 are called semantically equivalent
and we write V1 ≡ V2. A rule which transforms one formula into
another, semantically equivalent formula is called an
equivalence-preserving transformation rule. Below are examples of
such rules (read '≡' as 'the formula on the left side may be
replaced by the formula on the right side'). Assume that V is an
arbitrary VL2 formula; Pi, P1, P2 are atomic forms; R1, R2 ⊆ Di (the
domain of Pi); and R ⊆ D1 = D2 (the domains of P1 and P2).


* The atomic forms are here considered equal if they represent
functions which differ only in that some of their arguments are
substituted by a value from the domains of those arguments.


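$V[P_i = R_1] \vee V[P_i = R_2] \equiv V[P_i = R_1 \cup R_2]$   (14)

$V[P_i \neq R_1] \vee V[P_i \neq R_2] \equiv V[P_i \neq R_1 \cap R_2]$   (15)

If $R_1 \cup R_2 = D_i$ and $R_1 \cap R_2 = \emptyset$ (the empty set), then (14) and (15) reduce
to (16) and (17):

$V[P_i = R_1] \vee V[P_i = R_2] \equiv V$   (16)

$V[P_i \neq R_1] \vee V[P_i \neq R_2] \equiv V$   (17)

$V[P_1 = R] \vee V[P_2 = R] \equiv V[P_1, P_2 = R]$   (18)

$V[P_1 = R][P_2 = R] \equiv V[P_1 \cdot P_2 = R]$   (19)

$V[P_1 = R][P_2 \neq R] \equiv V[P_1 \cdot \overline{P}_2 = R]$   (20)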
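On finite domains, semantic equivalence can be verified by brute force over the whole event space. The sketch below checks several of the rules (14)-(20) this way. It rests on several assumptions: the domains `DP` and `DQ` and the reference sets `R1`, `R2`, `R` are hypothetical, the arbitrary formula V is modelled by an extra selector [Q = yes], and with only two output values, `True`/`False` stand for the largest/smallest element of D. A deliberately wrong "rule" is included to show that the check can fail.

```python
from itertools import product

# Hypothetical finite domains: P1 and P2 share DP (so D1 = D2), and an
# extra atomic form Q stands in for the arbitrary formula V as [Q = yes].
DP = ["a", "b", "c", "d"]
DQ = ["yes", "no"]
DOMAINS = [DP, DP, DQ]  # an event is (p1, p2, q)
R1, R2, R = {"a", "b"}, {"b", "c"}, {"a", "c"}


def sel(value, rel, ref):
    """Simple selector; True/False stand for the largest/smallest element of D."""
    return value in ref if rel == "=" else value not in ref


def sel_bar(value, rel, ref):
    """Selector on a barred form: [P-bar = R] behaves as [P != R]."""
    return sel(value, "!=" if rel == "=" else "=", ref)


def V(q):
    return q == "yes"


def equivalent(lhs, rhs, domains=DOMAINS):
    """Semantic equivalence: equal values on every event of the event space."""
    return all(lhs(*e) == rhs(*e) for e in product(*domains))


RULES = {
    "(14)": (lambda p1, p2, q: (V(q) and sel(p1, "=", R1)) or (V(q) and sel(p1, "=", R2)),
             lambda p1, p2, q: V(q) and sel(p1, "=", R1 | R2)),
    "(15)": (lambda p1, p2, q: (V(q) and sel(p1, "!=", R1)) or (V(q) and sel(p1, "!=", R2)),
             lambda p1, p2, q: V(q) and sel(p1, "!=", R1 & R2)),
    # {a, b} and {c, d} are disjoint and together cover DP.
    "(16)": (lambda p1, p2, q: (V(q) and sel(p1, "=", {"a", "b"})) or (V(q) and sel(p1, "=", {"c", "d"})),
             lambda p1, p2, q: V(q)),
    # [P1,P2 = R] is satisfied iff [P1 = R] or [P2 = R] is.
    "(18)": (lambda p1, p2, q: (V(q) and sel(p1, "=", R)) or (V(q) and sel(p2, "=", R)),
             lambda p1, p2, q: V(q) and (sel(p1, "=", R) or sel(p2, "=", R))),
    # [P1.P2-bar = R] is satisfied iff [P1 = R] and [P2-bar = R] are.
    "(20)": (lambda p1, p2, q: V(q) and sel(p1, "=", R) and sel(p2, "!=", R),
             lambda p1, p2, q: V(q) and sel(p1, "=", R) and sel_bar(p2, "=", R)),
}


def bad_lhs(p1, p2, q):
    """Not a valid rule: a conjunction where (14) has a disjunction."""
    return V(q) and sel(p1, "=", R1) and sel(p1, "=", R2)


for name, (lhs, rhs) in RULES.items():
    print(name, equivalent(lhs, rhs))                           # each prints True
print("counterexample", equivalent(bad_lhs, RULES["(14)"][1]))  # False
```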

Suppose now that the output domain of a DVL2 formula V is a
set D whose smallest element is *. Suppose further that all
elements of D except * denote certain 'specified decisions'
about events, and that the element * denotes an 'unspecified
decision'. Let E+ and E* denote the subsets of E(V) for which V
takes specified and unspecified decisions, respectively.

Events of E+ are those which satisfy at least one term in V
(i.e., satisfy all selectors in the term), while E* consists of the
remaining events in E(V), i.e., E* = E(V) \ E+. We will call the set
E+ the set of recognizable events of V, and E* the set of
non-recognizable events of V. Elements of E* will be called *-events.
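E+ and E* follow from evaluating the DVL2 formula on every event. A minimal sketch, assuming each term is a pair (decision, list of selector predicates) and D is given as an ascending list starting with `'*'`; the helper names and the small (length, #parts) example are hypothetical.

```python
from itertools import product

STAR = "*"  # unspecified decision: the smallest element of D


def value(terms, event, order):
    """Value of a DVL2 formula: the largest decision among satisfied terms, else '*'.

    terms -- list of (decision, [predicates on the event]); a term is satisfied
             when every one of its selectors (predicates) is satisfied
    order -- the output domain D in ascending order, starting with '*'
    """
    decisions = [d for d, selectors in terms if all(s(event) for s in selectors)]
    return max(decisions, key=order.index) if decisions else order[0]


def split_events(terms, domains, order):
    """Return (E+, E*): the recognizable events and the *-events of the formula."""
    events = list(product(*domains))
    e_plus = {e for e in events if value(terms, e, order) != STAR}
    e_star = set(events) - e_plus
    return e_plus, e_star


# Hypothetical example: event = (length, #parts); one term CLASS1[length=short][#parts=4].
domains = [["short", "long"], [3, 4]]
order = [STAR, "CLASS1", "CLASS2"]
terms = [("CLASS1", [lambda e: e[0] == "short", lambda e: e[1] == 4])]
e_plus, e_star = split_events(terms, domains, order)
print(sorted(e_plus))  # [('short', 4)]
print(len(e_star))     # 3
```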

Let V1 be a VL2 formula and E1+ its set of recognizable
events.

A rule which transforms the formula V1 into a new formula
V2 (whose set of recognizable events is E2+) is called a deductive
inference rule (DR) if

$V_1 \overset{E_2^+}{\equiv} V_2 \quad\text{and}\quad E_2^+ \subseteq E_1^+$   (21)

and is called an inductive inference rule (IR) if

$V_1 \overset{E_1^+}{\equiv} V_2 \quad\text{and}\quad E_2^+ \supset E_1^+$   (22)
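Conditions (21) and (22) can be tested directly when the event space is small enough to enumerate. In this minimal sketch, formulas are modelled as Python functions from events to decisions, with `'*'` as the unspecified decision; the one-form event space {0, ..., 9} and the helper `unit` are hypothetical.

```python
STAR = "*"  # unspecified decision


def recognizable(formula, events):
    return {e for e in events if formula(e) != STAR}


def classify(v1, v2, events):
    """'DR' if (21) holds for V1 -> V2, 'IR' if (22) holds, otherwise None."""
    e1, e2 = recognizable(v1, events), recognizable(v2, events)
    if e2 <= e1 and all(v1(e) == v2(e) for e in e2):
        return "DR"
    if e2 > e1 and all(v1(e) == v2(e) for e in e1):
        return "IR"
    return None


def unit(values):
    """Hypothetical formula deciding 'C' on the given events and '*' elsewhere."""
    return lambda x: "C" if x in values else STAR


events = range(10)  # one atomic form with domain {0, ..., 9}
v1 = unit({2, 3})
print(classify(v1, unit({2, 3, 4, 5}), events),  # generalization
      classify(v1, unit({2}), events),           # specialization
      classify(v1, unit({3, 4}), events))        # neither
# IR DR None
```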


According to (22), a rule is an IR iff V2 makes the same
specified decisions as V1 for the events of E1+, but also makes
specified decisions for some events outside E1+. The question
arises of how these 'other' events should be selected and what
decisions should be made about them. To answer this question, a
criterion governing an inductive rule is needed. We accept a
criterion which can be characterized as a 'criterion of
simplicity'. That is, we design a 'simplicity functional' for
VL2 formulas (which can be modified according to the application)
and employ inductive rules which maximize the assumed functional.

An important inductive rule of this type is the one which
assigns to *-events such decisions as permit one to apply rules
(14)-(20) to a given formula whenever this could lead to the
simplification of the formula according to the accepted measure
of simplicity (which, at the same time, means a generalization of
the formula).
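The effect of such an inductive rule can be illustrated with a much simpler strategy than the one used by the program described below. This is a minimal greedy sketch, explicitly *not* AQVAL/1's algorithm. It starts from one term per example, which gives the 'least general' description. It then drops a selector whenever the widened term still covers no example of the other class. Dropping a selector corresponds to deciding the affected *-events so that its reference can be extended to the whole domain, after which rule (16) removes it. Terms are dictionaries from descriptor names to reference sets, and the example data are hypothetical.

```python
def covers(term, event):
    """A term (conjunction of selectors [f = ref]) covers an event if all selectors hold."""
    return all(event[f] in ref for f, ref in term.items())


def subsumes(general, specific):
    """Sufficient check that every event covered by `specific` is covered by `general`."""
    return all(f in specific and specific[f] <= ref for f, ref in general.items())


def generalize(positives, negatives):
    """Greedy sketch: drop selectors while no negative event becomes covered."""
    terms = []
    for event in positives:
        term = {f: {v} for f, v in event.items()}  # least general term
        for f in list(term):
            trial = {g: ref for g, ref in term.items() if g != f}
            if not any(covers(trial, n) for n in negatives):
                term = trial
        terms.append(term)
    kept = []  # discard terms subsumed by other terms
    for t in terms:
        if any(subsumes(k, t) for k in kept):
            continue
        kept = [k for k in kept if not subsumes(t, k)] + [t]
    return kept


pos = [{"length": "short", "parts": 4}, {"length": "short", "parts": 3}]
neg = [{"length": "long", "parts": 4}, {"length": "long", "parts": 3}]
print(generalize(pos, neg))  # [{'length': {'short'}}]
```

The result depends on the order in which selectors are tried, which is one reason a program optimizing an explicit simplicity criterion needs a more careful search than this sketch.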

An inductive program called AQVAL/1, which operates on such
principles, has been developed at the University of Illinois and
has already been applied experimentally to selected learning and
recognition problems from the areas of medicine⁵ and plant pathology
(the current version of AQVAL/1 implements a subset of VL2 called
VL1). It should be mentioned that problems of inductive learning by
means of variable-valued logic are strongly related to the
problems of grammatical inference.


DESCRIBING  OBJECTS  IN  TERMS  OF  VL2 

In  the  application  of  VL2  to  describing  objects,  atomic  forms 
are  used  to  represent  certain  functions  called  descriptors. 
Descriptors  are  functions  which  a  learning  system  uses  to  describe 
objects. 

Let pi denote a descriptor:

$p_i: \mathop{\times}_{j \in J} D_{ij} \to D_i$   (23)

where

× denotes the Cartesian product
J = {1, 2, ..., k}

Dij - the input domains of the descriptor
Di - the output domain of the descriptor

Special cases of a descriptor:

1. J = {1}, i.e., pi is a unary function. If Di1 denotes a set
of objects, and Di a set of values of a specific
characteristic of the objects, then pi is called a feature.

2. J = {1, 2, ..., k}, k = 2, 3, ..., Di1 = Di2 = ... = Dik,
Di = {true, false}. If Di1 denotes a set of objects, then pi can
be interpreted as a k-ary relation among these objects. If
pi(Oi1, Oi2, ..., Oik) = true, then we say that the relation among
Oi1, Oi2, ..., Oik holds; otherwise it does not hold. If Di is not
a binary-valued set but has a finite number of values, then we
will say that pi is a multi-valued k-ary relation.

As we can see, a descriptor has a very broad meaning.

Example 

Suppose Dx1 = Dx2 denote a set of parts of a certain physical
object. To express the fact that, e.g., the relation 'above' holds
between certain parts of the object, we can use a function:

$\mathrm{ABOVE}: D_{x_1} \times D_{x_2} \to \{\text{true}, \text{false}\}$   (24)

If the relation 'above' holds between O1 and O2, we write
[ABOVE(O1, O2) = true], or, since the output domain is just binary,
simply ABOVE(O1, O2). Suppose, however, that we want to distinguish
among three possibilities: not above, little above, much above. In
this case we assume that

D1 = {not, little, much}   (25)

To express the fact that O1 is much above O2, we use the
selector

[ABOVE(O1, O2) = much]   (26)

If, in describing a class of objects, we observe that part O1 is
either much above or not above part O2, we would write:

[ABOVE(O1, O2) = not, much]   (27)
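A multi-valued descriptor such as ABOVE is simply a function into a small finite domain, and a selector with several values in its reference tests set membership. In this minimal sketch, the part names, the vertical coordinates in `HEIGHT` and the threshold `LITTLE_MAX` are all hypothetical. The text does not say how ABOVE is measured.

```python
# Hypothetical geometry: vertical coordinate of each part's centre (not from the text).
HEIGHT = {"top": 10.0, "left-leg": 4.0, "bar": 3.0}
LITTLE_MAX = 2.0  # hypothetical threshold separating 'little' from 'much'


def above(o1, o2):
    """Multi-valued descriptor ABOVE: parts x parts -> {not, little, much}."""
    dy = HEIGHT[o1] - HEIGHT[o2]
    if dy <= 0:
        return "not"
    return "little" if dy <= LITTLE_MAX else "much"


def selector(value, reference):
    """[ABOVE(O1, O2) = reference]: satisfied iff the value lies in the reference set."""
    return value in reference


print(above("top", "left-leg"), above("left-leg", "bar"), above("bar", "top"))
# much little not
print(selector(above("top", "left-leg"), {"not", "much"}))  # True, as in (27)
print(selector(above("left-leg", "bar"), {"not", "much"}))  # False
```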


In describing individual objects, we can distinguish the
following classes of descriptors:

1. Global, 0-level, descriptors. These are features which
characterize objects as a whole (e.g., color, size,
texture, length, etc.).

2. Local 1-level descriptors, which characterize basic
(1-level) parts and k-ary (k = 2, 3, ...) relationships
among them.

3. Local k-level (k = 2, 3, ...) descriptors, which
characterize k-level parts and relationships among the
parts of the (k-1)-level parts.


AN EXAMPLE OF LEARNING THE SIMPLEST DESCRIPTION OF THE
DIFFERENCE BETWEEN TWO CLASSES OF OBJECTS

Suppose we want to develop a machine which, given examples of
objects from certain classes, could learn the simplest (according
to some defined criteria) description of the object classes or of
the differences between classes. Let us assume that the machine
already has certain built-in elementary abilities, such as the
ability to recognize a triangle or a rectangle, to measure their
size and orientation, and to determine various relationships between
the recognized objects, e.g., the relations 'on top of', 'in
between', etc. Implementing abilities of this type is quite
difficult in itself. However, computer programs have already been
developed which can, to a limited degree, measure descriptors of
the kind described above (see, e.g., Winston). It is important to
observe, however, that the number of such elementary descriptors
which may potentially be needed is not very large, and therefore
each of them could be implemented by a specially designed software
or hardware device. On the other hand, the number of potential
combinations of these descriptors which may occur in descriptions of
real objects is prohibitively large. Therefore, an important problem
which we are addressing is how to implement very efficient
inference and learning processes which create goal-oriented
descriptions of objects or object classes, assuming that these
elementary descriptors are available. This type of problem is
illustrated by the following example.

Fig.  1  presents  two  classes  of  'TABLES'.   The  objective  is 
to  implement  a  learning  process  which  would  produce  the  simplest 
description,  with  regard  to  an  assumed  simplicity  functional,  of 
the  difference  between  these  two  classes  of  TABLES. 

Suppose that the following descriptors and their domains are
used to describe the TABLES:

1. global descriptors: length, {short, long}

#parts, {3, 4}

2. a) features of individual parts Pi, i = 1, 2, 3, 4 (top
rectangle, left triangle, right triangle, bar):

part-type(Pi), {0, rectangle, left triangle, right triangle, bar}

part-length(Pi), {0, short, long}

part-texture(Pi), {0, ©, ©, ©, ©}

(0 means 'not relevant', i.e., the part does not exist)

b) binary relations among parts: on-top(Pi, Pj),
{above-middle, above-left, above-right}

c) ternary relations among parts: in-between(Pi, Pj, Pk),
{low, high} (part Pi is between Pj and Pk).

Using these descriptors, the machine describes each object
in terms of the VL2 system as a conjunction of selectors. For
example, object 1 in class 1 would be described as:

[length = short][#parts = 4][part-type(P1) = rectangle]
[part-type(P2) = left triangle][part-type(P3) = right triangle]
[part-type(P4) = bar][part-length(P1) = short]
[on-top(P1, P2) = above-right][in-between(P4, P2, P3) = high]   (28)

Suppose that Ti1, Ti2, Ti3 and Ti4 denote the descriptions of
objects 1, 2, 3, 4 in class i, i = 1, 2, respectively. A description
of class 1 (the 'least general' one) could then be:

$\mathrm{CLASS1}(T_{11} \vee T_{12} \vee T_{13} \vee T_{14})$   (29)

and of class 2:

$\mathrm{CLASS2}(T_{21} \vee T_{22} \vee T_{23} \vee T_{24})$   (30)

where {*, CLASS1, CLASS2} is the output domain of the formulas.

Events which do not satisfy either of these formulas are *-events.
Suppose now that as a simplicity criterion we accept a criterion
demanding that a formula have the minimum number of terms and, with
secondary priority, the minimum number of selectors.
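This two-level criterion is a lexicographic comparison of (number of terms, number of selectors). A minimal sketch, assuming a formula is represented as a list of terms and each term as a list of selector labels; the candidate formulas below are hypothetical and only illustrate that the term count dominates.

```python
def cost(formula):
    """Fewest terms first, then fewest selectors: compare these tuples lexicographically."""
    return (len(formula), sum(len(term) for term in formula))


def simplest(candidates):
    """Pick the candidate with the lexicographically smallest cost."""
    return min(candidates, key=cost)


# Hypothetical candidates; selector labels are placeholders.
least_general = [[f"s{i}"] * 9 for i in range(4)]  # 4 terms, 36 selectors
one_term = [["s1", "s2", "s3", "s4", "s5"]]         # 1 term, 5 selectors
two_terms = [["s1"], ["s2"]]                        # 2 terms, 2 selectors
print(cost(least_general), cost(one_term), cost(two_terms))  # (4, 36) (1, 5) (2, 2)
print(simplest([least_general, two_terms, one_term]) is one_term)  # True
```

Although `two_terms` has fewer selectors, `one_term` wins because the number of terms has priority.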

One way to attain the simplest (in the above sense) description
of the difference between the two classes is to maximally simplify
and generalize formulas (29) and (30) under the restriction that
the resulting formulas have an empty intersection. (An 'empty
intersection' means that no event satisfies both formulas.) This is
done by assigning to *-events such decisions as lead to the maximal
simplification and generalization of formulas (29) and (30) by means
of rules (14)-(20) (without violating the above restriction). Such an
inductive process can be executed very efficiently by the previously
mentioned computer program AQVAL/1. The simplest formulas for both
classes, according to our criterion, obtained from AQVAL/1 were:

CLASS1[length = short][part-texture(P4) = ([), ©]   (31)

CLASS2[length = long] ∨ [part-texture(P4) = 0, @]   (32)

(The execution time was less than 3 sec. on the IBM 360/75;
AQVAL/1 is written in PL/1.)


These formulas state that TABLES of class 1 are 'short' and the
bar has one of the two textures listed in (31), and that TABLES of
class 2 are either long, or the bar has the texture listed in (32),
or there is no bar.
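The empty-intersection restriction can be checked on formulas (31) and (32) once their textures are given names. In this minimal sketch, t1..t4 are hypothetical placeholder names for the four bar textures of Fig. 1. The sketch assumes t1 and t2 are the two textures listed in (31) and t3 is the one listed in (32). The event space is restricted to the two descriptors that occur in these formulas.

```python
from itertools import product

# Hypothetical placeholder names for the four bar textures of Fig. 1;
# "0" means the part does not exist.
LENGTH = ["short", "long"]
TEXTURE = ["0", "t1", "t2", "t3", "t4"]


def class1(length, texture):
    """(31): [length = short][part-texture(P4) = t1, t2] (texture names assumed)."""
    return length == "short" and texture in {"t1", "t2"}


def class2(length, texture):
    """(32): [length = long] ∨ [part-texture(P4) = 0, t3] (texture name assumed)."""
    return length == "long" or texture in {"0", "t3"}


events = list(product(LENGTH, TEXTURE))
both = [e for e in events if class1(*e) and class2(*e)]
star = [e for e in events if not class1(*e) and not class2(*e)]
print(both)  # [] -> the two descriptions have an empty intersection
print(star)  # [('short', 't4')] -> still a *-event under this labelling
```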

This description of the classes seems to agree well with what
a human might accept as the 'simplest' difference between the two
classes.